Relative-Error CUR Matrix Decompositions

Authors

  • Petros Drineas
  • Michael W. Mahoney
  • S. Muthukrishnan
Abstract

Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and are thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms that take as input an m×n matrix A and a rank parameter k. In our first algorithm, C is chosen, and we let A′ = CC+A, where C+ is the Moore–Penrose generalized inverse of C. In our second algorithm C, U, R are chosen, and we let A′ = CUR. (C and R are matrices that consist of actual columns and rows, respectively, of A, and U is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least 1 − δ, ‖A − A′‖F ≤ (1 + ε) ‖A − Ak‖F , where Ak is the “best” rank-k approximation provided by truncating the SVD of A, and where ‖X‖F is the Frobenius norm of the matrix X. The number of columns of C and rows of R is a low-degree polynomial in k, 1/ε, and log(1/δ). Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants of these matrix decompositions over the last ten years. However, our two algorithms are the first polynomial time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist. Both of our algorithms are simple and they run in time of the order needed to approximately compute the top k singular vectors of A.
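As a concrete illustration of the second decomposition, here is a minimal NumPy sketch that forms A′ = CUR with U taken as the Moore–Penrose pseudo-inverse of the intersection of C and R. The column/row indices below are fixed by hand for a toy example; the paper's algorithms select them randomly.

```python
import numpy as np

def cur_decomposition(A, col_idx, row_idx):
    """Form a CUR approximation A' = C @ U @ R from chosen indices.

    C and R are actual columns and rows of A; U is the Moore-Penrose
    pseudo-inverse of their intersection W = A[row_idx][:, col_idx].
    """
    C = A[:, col_idx]                # sampled columns of A
    R = A[row_idx, :]                # sampled rows of A
    W = A[np.ix_(row_idx, col_idx)]  # intersection of C and R
    U = np.linalg.pinv(W)
    return C, U, R

# Toy usage: an exactly rank-2 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 15))
C, U, R = cur_decomposition(A, col_idx=[0, 3, 7], row_idx=[1, 4, 9])
err = np.linalg.norm(A - C @ U @ R, 'fro')
```

When the intersection W has the same rank as A, the identity A = CW+R holds exactly; the randomized algorithms choose indices so that, for general A, the Frobenius error is within a (1 + ε) factor of the best rank-k error.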
The technical crux of our analysis is a novel, intuitive sampling method we introduce in this paper called “subspace sampling.” In subspace sampling, the sampling probabilities depend on the Euclidean norms of the rows of the top singular vectors. This allows us to obtain provable relative-error guarantees by deconvoluting “subspace” information and “size-of-A” information in the input matrix. This technique is likely to be useful for other matrix approximation and data analysis problems.
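A minimal sketch of the subspace-sampling probabilities for the columns of A, computed here from an exact SVD for clarity (the function name is ours, and the paper's algorithms only need approximations to the top-k singular vectors):

```python
import numpy as np

def column_leverage_probs(A, k):
    """Subspace-sampling probabilities for the columns of A.

    Each column's probability is proportional to the squared Euclidean
    norm of the corresponding row of V_k, the matrix of the top-k right
    singular vectors.
    """
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :].T                # n x k: top-k right singular vectors
    scores = np.sum(Vk**2, axis=1)  # squared row norms; they sum to k
    return scores / k               # normalize into a distribution

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
p = column_leverage_probs(A, k=3)
```

Because the columns of V_k are orthonormal, the squared row norms sum to exactly k, so dividing by k yields a valid probability distribution over the columns.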


Similar papers

CUR matrix decompositions for improved data analysis.

Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes gene...
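For reference, the "best" rank-k approximation against which CUR decompositions are measured is the truncated SVD; a minimal NumPy sketch (the helper name is ours):

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation of A via the truncated SVD
    (optimal in Frobenius norm, by the Eckart-Young theorem)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
Ak = best_rank_k(A, 2)
err = np.linalg.norm(A - Ak, 'fro')
# Optimal error equals the root-sum-square of the discarded singular values.
opt = np.sqrt(np.sum(np.linalg.svd(A, compute_uv=False)[2:]**2))
```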


A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

The CUR matrix decomposition is an important extension of Nyström approximation to a general matrix. It approximates any data matrix in terms of a small number of its columns and rows. In this paper we propose a novel randomized CUR algorithm with an expected relative-error bound. The proposed algorithm has the advantages over the existing relative-error CUR algorithms that it possesses tighter...


Subspace Sampling and Relative-Error Matrix Approximation: Column-Row-Based Methods

Much recent work in theoretical computer science, linear algebra, and machine learning has considered matrix decompositions of the following form: given an m×n matrix A, decompose it as a product of three matrices, C, U, and R, where C consists of a small number of columns of A, R consists of a small number of rows of A, and U is a small carefully constructed matrix that guarantees that th...


Spectral Gap Error Bounds for Improving CUR Matrix Decomposition and the Nyström Method

The CUR matrix decomposition and the related Nyström method build low-rank approximations of data matrices by selecting a small number of representative rows and columns of the data. Here, we introduce novel spectral gap error bounds that judiciously exploit the potentially rapid spectrum decay in the input matrix, a most common occurrence in machine learning and data analysis. Our error bounds...


arXiv:0708.3696v1 [cs.DS] 27 Aug 2007: Relative-Error CUR Matrix Decompositions

Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly express...


Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling

The CUR matrix decomposition and the Nyström approximation are two important low-rank matrix approximation techniques. The Nyström method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, CUR decomposition can be regarded as an extension of the Nyström a...
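To make the special case concrete, here is a minimal sketch of the Nyström approximation of a symmetric positive semidefinite matrix K from a subset of its columns, with landmark indices fixed by hand (a toy illustration, not the adaptive-sampling scheme this paper proposes):

```python
import numpy as np

def nystrom(K, idx):
    """Nystrom approximation of a symmetric PSD matrix K:
    K ~= C @ W+ @ C.T, where C = K[:, idx] holds the sampled columns
    and W = K[idx][:, idx] is the corresponding principal submatrix."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

# Toy usage: a rank-2 PSD Gram matrix, recovered from 3 landmarks.
rng = np.random.default_rng(3)
X = rng.standard_normal((30, 2))
K = X @ X.T
K_hat = nystrom(K, [0, 5, 11])
err = np.linalg.norm(K - K_hat, 'fro')
```

When the sampled principal submatrix W has the same rank as K, the reconstruction is exact; in general the quality depends on which columns are sampled, which is what adaptive sampling aims to improve.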



Journal:
  • SIAM J. Matrix Analysis Applications

Volume 30, Issue

Pages -

Publication date 2008